-
Data center operators generally overprovision IT and cooling capacity to absorb unexpected utilization increases that could violate service quality commitments. This overprovisioning wastes energy. To reduce the waste, we introduce HCP (Holistic Capacity Provisioner), a service-latency-aware management system that dynamically provisions server and cooling capacity. Short-term load prediction is used to adjust the online server capacity so that the workload is concentrated onto the smallest possible set of online servers. Idling servers are turned off completely based on a separate long-term utilization predictor. HCP targets data centers that use chilled-air cooling and varies the cooling provided commensurately, using adjustable-aperture tiles and speed control of the blower fans in the air handler. An HCP prototype supporting server heterogeneity is evaluated with real-world workload traces/requests and realizes up to 32% total energy savings while limiting the 99th-percentile and average latency increases to at most 6.67% and 3.24%, respectively, against a baseline system where all servers are kept online.
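The idea of concentrating load onto the smallest set of online servers can be illustrated with a minimal sketch. This is not HCP's algorithm: it assumes identical servers, a single predicted load figure, and a fixed safety headroom, and the names `servers_needed`, `per_server_capacity_rps`, and `headroom` are hypothetical.

```python
import math


def servers_needed(predicted_load_rps, per_server_capacity_rps, headroom):
    """Return the smallest number of online servers that covers the
    predicted load plus a safety headroom.

    Assumptions (not from the abstract): all servers are identical, load
    is expressed in requests per second, and headroom is a fraction such
    as 0.25 for 25% spare capacity.
    """
    if per_server_capacity_rps <= 0:
        raise ValueError("per-server capacity must be positive")
    if predicted_load_rps < 0 or headroom < 0:
        raise ValueError("load and headroom must be non-negative")
    required_rps = predicted_load_rps * (1.0 + headroom)
    # Keep at least one server online so that new requests can be served.
    return max(1, math.ceil(required_rps / per_server_capacity_rps))


if __name__ == "__main__":
    # Hypothetical numbers: 1000 req/s predicted, 100 req/s per server.
    print(servers_needed(1000, 100, 0.25))
```

A heterogeneous fleet, as supported by the prototype described above, would need a per-server-type capacity model rather than a single capacity value.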
-
Proponents of AC-powered data centers have implicitly assumed that the electrical load presented to all three phases of an AC data center is balanced. To ensure this, servers are connected to the AC power phases so as to present identical loads, assuming a uniform expected utilization level for each server. We present an experimental study demonstrating that, with the inevitable temporal changes in server workloads or with dynamic server capacity management based on known daily load patterns, balanced electrical loading across all power phases cannot be maintained. Such imbalances introduce a reactive power component that represents an effective power loss and lowers the overall energy efficiency of the data center, resulting in a handicap against DC-powered data centers, where such a loss is absent.
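As a rough illustration of what "unbalanced" means here, the sketch below computes two common indicators from per-phase current magnitudes. It is not the measurement method of the study and does not compute the power loss the abstract describes. It assumes a wye-connected system with phases exactly 120 degrees apart and equal power factor on every phase; the function names are hypothetical.

```python
import math


def phase_imbalance_percent(ia, ib, ic):
    """Maximum deviation from the average phase current, as a percentage
    of the average."""
    avg = (ia + ib + ic) / 3.0
    if avg <= 0:
        raise ValueError("average phase current must be positive")
    max_dev = max(abs(i - avg) for i in (ia, ib, ic))
    return 100.0 * max_dev / avg


def neutral_current(ia, ib, ic):
    """Magnitude of the phasor sum of three currents spaced 120 degrees
    apart (assumes equal power factor on all phases)."""
    squared = ia * ia + ib * ib + ic * ic - ia * ib - ib * ic - ic * ia
    # Guard against a tiny negative value caused by floating-point rounding.
    return math.sqrt(max(0.0, squared))


if __name__ == "__main__":
    # Hypothetical per-phase currents in amperes.
    print(phase_imbalance_percent(12.0, 10.0, 8.0))
    print(neutral_current(12.0, 10.0, 8.0))
```

With identical currents on all three phases both indicators are zero; any shift of load between phases makes both grow.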
-
The recent availability of water cooling systems that can be easily retrofitted to stock servers by replacing the heatsinks with coldplates has made it possible to use such systems for non-HPC cloud/data center servers. These cooling systems use pumps to circulate water, and the pumps are likely to fail in the long run. We present a technique for handling flow disruptions caused by pump failures in a virtualized environment. The solution uses an estimate of the residual cooling capacity left in the failed cooling system to adaptively adjust the CPU clock frequency as virtual machines are migrated off the racks affected by the failure. As the experimental results show, this minimizes the degradation of the tail latencies of served requests during the migration interval for all servers affected by the failure.
-
The recent availability of warm water cooling systems that can be easily retrofitted to stock servers by replacing the heatsinks with coldplates has made it possible to use such cooling for non-HPC cloud/data center servers. These cooling systems use internal pumps in rack-level heat exchangers as well as external pumps, all of which can fail. We present a systematic study of the pump failures that disrupt flow in the cooling system, and we propose and experimentally evaluate techniques for reducing service disruptions during failures while avoiding damage to the servers where water cooling has failed.
-
During the lifespan of a data center, power outages and blower cooling failures are common occurrences. Given the vital role of data centers in modern life, it is especially important to understand these failures and their effects. A previous study [16] showed that cold aisle containment might have a negative impact on IT equipment uptime during a blower failure. This new study further analyzes the impact of containment on IT equipment uptime during a CRAH blower failure. It also compares IT equipment performance with and without a pressure relief mechanism implemented in the containment system. The results show that the effect of implementing pressure relief in a containment solution on IT equipment performance and response can vary, depending on the servers' airflow and generation, and hence on the types of servers deployed in the cold aisle enclosure. The results also show that, compared to the discrete sensors, the IPMI inlet temperature sensors underestimate the Ride Through Time (RTT) by 32%. This means that RTT calculations based on the IPMI inlet sensors may be inaccurate due to variations in the sensor readings as they exist today in these servers, as discussed in a previous study [26]. Additionally, it was shown that all Dell PowerEdge 2950 servers have a similar IPMI inlet temperature reading, regardless of mounting location. As external system resistance increases during a cooling failure, the servers exhibit internal recirculation through their weaker power supply fans, which is reflected in the high IPMI inlet temperature readings. For this server specifically, a pressure relief mechanism reduces the external resistance, thereby eliminating internal recirculation and resulting in lower IPMI inlet temperature readings. This in turn translates to a lower RTT. However, the results for pressure relief were conflicting: the discrete sensors showed an increase in inlet temperature when pressure relief was introduced, thereby reducing the RTT. The CPU temperatures conformed with the discrete sensor data, indicating that containment helped increase the RTT of the servers during failure.